Introduction to ggplot2

Adam M. Wilson

October 2015

Today

  1. ggplot graphics
  2. spatial (vector) data with sp package

ggplot2

The grammar of graphics: consistent aesthetics, multidimensional conditioning, and step-by-step plot building.

  1. Data: The raw data
  2. geom_: The geometric shapes representing data
  3. aes(): Aesthetics of the geometric and statistical objects (color, size, shape, and position)
  4. scale_: Maps between the data and the aesthetic dimensions

data
+ geometry
+ aesthetic mappings such as position, color, and size
+ scaling of ranges of the data to ranges of the aesthetics
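
As a rough illustration of how these four pieces combine, the sketch below builds one plot from mtcars. The choice of hp for colour and the grey-to-black gradient are assumptions made for this example only.

```r
library(ggplot2)

# data + aesthetic mappings (position and colour)
g <- ggplot(mtcars, aes(x = wt, y = mpg, colour = hp)) +
  # geometry: one point per row of the data
  geom_point() +
  # scale: maps the range of hp onto a range of colours
  scale_colour_gradient(low = "grey70", high = "black")
g
```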

Additional settings

  1. stat_: Statistical summaries of the data that can be plotted, such as quantiles, fitted curves (loess, linear models), etc.
  2. coord_: Transformation for mapping data coordinates into the plane of the data rectangle
  3. facet_: Arrangement of data into grid of plots
  4. theme: Visual defaults (background, grids, axes, typeface, colors, etc.)
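
A minimal sketch of the additional settings layered onto a basic scatterplot. The particular stat, limits, faceting variable (am), and theme are illustrative assumptions, not recommendations.

```r
library(ggplot2)

g2 <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  stat_smooth(method = "lm") +        # stat_: fitted linear model
  coord_cartesian(xlim = c(1, 6)) +   # coord_: zoom without dropping data
  facet_wrap(~am) +                   # facet_: one panel per transmission type
  theme_bw()                          # theme: visual defaults
g2
```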

Simple scatterplot

library(ggplot2)
p <- ggplot(mtcars, aes(x=wt, y=mpg))
p + geom_point()

Aesthetic map: color by # of cylinders

p + 
  geom_point(aes(colour = factor(cyl)))
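
The factor() call matters here: cyl is stored as a number, so mapping it directly gives a continuous colour gradient, while factor(cyl) gives one distinct colour per level. A small sketch of the contrast (the names p_cont and p_disc are made up for this example):

```r
library(ggplot2)

p <- ggplot(mtcars, aes(x = wt, y = mpg))

# numeric variable: continuous colour gradient and a colour bar legend
p_cont <- p + geom_point(aes(colour = cyl))

# factor: discrete colours, one legend key per cylinder count
p_disc <- p + geom_point(aes(colour = factor(cyl)))

p_cont
p_disc
```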

Set shape using # of cylinders

p + 
  geom_point(aes(shape = factor(cyl)))

Adjust size by qsec

p + 
  geom_point(aes(size = qsec))

Color by cylinders and size by qsec

p + 
  geom_point(aes(colour = factor(cyl),size = qsec))

Multiple aesthetics

p + 
  geom_point(aes(colour = factor(cyl),size = qsec,shape=factor(gear)))

Add a linear model

p + geom_point() + 
  geom_smooth(method="lm")
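
To see what the smoother is drawing, one option is to fit the same model with lm() and overlay its coefficients with geom_abline(). This is only a sketch; the object name fit and the dashed red styling are assumptions for illustration.

```r
library(ggplot2)

# Fit the same simple linear regression that geom_smooth(method = "lm") uses
fit <- lm(mpg ~ wt, data = mtcars)

ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE) +
  # The dashed line from the fitted coefficients should lie along the smoother,
  # extended across the full width of the panel
  geom_abline(intercept = coef(fit)[["(Intercept)"]],
              slope = coef(fit)[["wt"]],
              linetype = "dashed", colour = "red")
```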

Change scale color

p + geom_point(aes(colour = cyl)) + 
  scale_colour_gradient(low = "blue")

Change scale shapes

p + geom_point(aes(shape = factor(cyl))) + 
  scale_shape(solid = FALSE)

Set aesthetics to fixed value

ggplot(mtcars, aes(wt, mpg)) + 
  geom_point(colour = "red", size = 3)
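
A common stumbling block is putting a fixed value inside aes(). The sketch below contrasts the two forms; the names fixed and mapped are made up for this example.

```r
library(ggplot2)

base <- ggplot(mtcars, aes(x = wt, y = mpg))

# Outside aes(): every point is drawn in red, no legend
fixed <- base + geom_point(colour = "red")

# Inside aes(): "red" is treated as a data value with one level, so the
# points get the first default palette colour and a legend labelled "red"
mapped <- base + geom_point(aes(colour = "red"))

fixed
mapped
```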

Transparency: alpha=0.2

d <- ggplot(diamonds, aes(carat, price))
d + geom_point(alpha = 0.2)

Varying alpha is useful for large data sets, where overplotting hides the density of points.

Transparency: alpha=0.1

d + 
  geom_point(alpha = 0.1)

Transparency: alpha=0.01

d + 
  geom_point(alpha = 0.01)

Building ggplots

Other Plot types

Your turn

Edit plot p above to include:

  1. Points
  2. A smooth (‘loess’) curve
  3. A “rug” along the axes

p +
  geom_point() +
  geom_smooth(method = "loess") +
  geom_rug()

Discrete X, Continuous Y

p <- ggplot(mtcars, aes(factor(cyl), mpg))
p + geom_point()

Discrete X, Continuous Y + geom_jitter()

p + 
  geom_jitter()

Discrete X, Continuous Y + geom_violin()

p + 
  geom_violin()

Discrete X, Continuous Y + geom_violin() + geom_jitter()

p + 
  geom_violin() + geom_jitter(position = position_jitter(width = .1))

Three Variables

We will come back to this next week with the raster package.

Stats

Visualize a data transformation

  • Each stat creates additional variables with a common ..name.. syntax
  • Often two ways: stat_bin(geom="bar") OR geom_bar(stat="bin")
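
A brief sketch of both points, using a histogram of mpg from mtcars (the variable and the bin count are arbitrary choices for this example). layer_data() lists the variables the stat computed, and ..density.. picks one of them for the y axis. Newer ggplot2 releases prefer after_stat(density) over the ..density.. form and may warn about the latter.

```r
library(ggplot2)

h <- ggplot(mtcars, aes(x = mpg))

# Two ways to request the same binned bar layer
h1 <- h + stat_bin(geom = "bar", bins = 10)
h2 <- h + geom_bar(stat = "bin", bins = 10)

# Variables created by the stat (count, density, ncount, ...)
names(layer_data(h1))

# Plot a computed variable instead of the default count
h3 <- h + stat_bin(aes(y = ..density..), geom = "bar", bins = 10)
h3
```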

2D kernel density estimation

Old Faithful Geyser Data on duration and waiting times.

library("MASS")
data(geyser)
m <- ggplot(geyser, aes(x = duration, y = waiting))

Photo: Greg Willis

See ?geyser for details.

m + 
  geom_point()

m + 
  geom_point() +  stat_density2d(geom="contour")

Check ?geom_density2d() for details

m + 
  geom_point() +  stat_density2d(geom="contour") +
  xlim(0.5, 6) + ylim(40, 110)

Update limits to show full contours. Check ?geom_density2d() for details

m + stat_density2d(aes(fill = ..level..), geom="polygon") + 
  geom_point(col="red")

Check ?geom_density2d() for details

Your turn

Edit plot m to include:

  • The point data (with red points) on top
  • A binhex plot of the Old Faithful data

Experiment with the number of bins to find one that works.

See ?stat_binhex for details

m + stat_binhex(bins=10) + 
  geom_point(col="red")

Specifying Scales

Discrete color: default

b <- ggplot(mpg, aes(fl)) +
  geom_bar(aes(fill = fl))
b

Discrete color: greys

b + scale_fill_grey( start = 0.2, end = 0.8, 
                   na.value = "red")

Continuous color: defaults

a <- ggplot(mpg, aes(hwy)) +
  geom_dotplot(aes(fill = ..x..))
a

Continuous color: gradient

a +  scale_fill_gradient( low = "red", 
                          high = "yellow")

Continuous color: gradient2

a + scale_fill_gradient2(low = "red", high = "blue", 
                       mid = "white", midpoint = 25)

Continuous color: gradientn

a + scale_fill_gradientn(
  colours = rainbow(10))

Discrete color: brewer

b + 
  scale_fill_brewer( palette = "Blues")

colorbrewer2.org

ColorBrewer: Diverging

ColorBrewer: Filtered

Your turn

Edit the contour plot of the geyser data to use a sequential brewer palette:

m +
  stat_density2d(aes(fill = ..level..), geom="polygon") + 
  geom_point(col="red")

Note: use scale_fill_distiller() rather than scale_fill_brewer() for continuous data.

m + stat_density2d(aes(fill = ..level..), geom="polygon") + 
  geom_point(size=.75)+
  scale_fill_distiller(palette="OrRd",#breaks=c(0.005,0.008,0.01),
                       name="Kernel\nDensity")+
      xlim(0.5, 6) + ylim(40, 110)+
  xlab("Eruption Duration (minutes)")+
  ylab("Waiting time (minutes)")

Or use geom="tile" for a raster representation.

m + stat_density2d(aes(fill = ..density..), geom="tile", contour=FALSE) + 
  geom_point(size=.75)+
  scale_fill_distiller(palette="OrRd",
                       name="Kernel\nDensity")+
      xlim(0.5, 6) + ylim(40, 110)+
  xlab("Eruption Duration (minutes)")+
  ylab("Waiting time (minutes)")

Axis scaling

Create noisy exponential data

set.seed(201)
n <- 100
dat <- data.frame(
    xval = (1:n+rnorm(n,sd=5))/20,
    yval = 10^((1:n+rnorm(n,sd=5))/20)
)

Make scatter plot with regular (linear) axis scaling

sp <- ggplot(dat, aes(xval, yval)) + geom_point()
sp

Example from R Cookbook

log10 scaling of the y axis (with visually-equal spacing)

sp + scale_y_log10()

Coordinate Systems

Position

Stacked bars

ggplot(diamonds, aes(clarity, fill=cut)) + geom_bar()

Dodged bars

ggplot(diamonds, aes(clarity, fill=cut)) + geom_bar(position="dodge")
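
A third position adjustment, position="fill", stacks the bars and rescales each one to a height of 1, which makes proportions easier to compare across clarity classes. A minimal sketch (the name pf and the axis label are assumptions for this example):

```r
library(ggplot2)

# Filled bars: each bar is stacked and normalised to a total height of 1
pf <- ggplot(diamonds, aes(clarity, fill = cut)) +
  geom_bar(position = "fill") +
  ylab("Proportion")
pf
```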

Example Graphics from the NY Times

Baseball performance

Source: NY Times

Wealthiest 1%

Source: NY Times

Donors visit the White House

Source: NY Times

Themes

ggplot2 themes

Quickly change plot appearance with themes.

More options in the ggthemes package.

library(ggthemes)

Or build your own!
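
One way to build your own is to start from a complete theme and override individual elements with theme(). The sketch below uses only ggplot2; the name my_theme and the specific settings are illustrative assumptions.

```r
library(ggplot2)

# Start from a built-in theme, then override selected elements
my_theme <- theme_bw(base_size = 14) +
  theme(
    legend.position = "bottom",           # move the legend below the panel
    panel.grid.minor = element_blank(),   # drop minor grid lines
    plot.title = element_text(face = "bold")
  )

custom_plot <- ggplot(mtcars, aes(x = wt, y = mpg, colour = factor(cyl))) +
  geom_point() +
  ggtitle("Custom theme") +
  my_theme
custom_plot
```

Because my_theme is an ordinary object, it can be added to any other plot in the same way.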

Theme examples: default

p <- ggplot(mpg, aes(x = cty, y = hwy, color = factor(cyl))) +
  geom_jitter() +
  labs(
    x = "City mileage/gallon",
    y = "Highway mileage/gallon",
    color = "Cylinders"
  )

Theme examples: default

p

Theme examples: Solarized

p + theme_solarized()

Theme examples: Solarized Dark

p +  theme_solarized(light=FALSE)

Theme examples: Excel

p + theme_excel() 

Theme examples: The Economist

p + theme_economist()

Faceting

facet_wrap(): one variable

ggplot(mpg, aes(x = cty, y = hwy, color = factor(cyl))) +
  geom_jitter()+
  facet_wrap(~year)

facet_grid(): two variables

ggplot(mpg, aes(x = cty, y = hwy, color = factor(cyl))) +
  geom_jitter()+
  facet_grid(year~cyl)

Very useful for time series of spatial data.
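
Both facet functions take layout and scale options. As a hedged sketch, the example below facets by drive type (drv, chosen arbitrarily) in a single column and lets each panel have its own y axis:

```r
library(ggplot2)

f <- ggplot(mpg, aes(x = cty, y = hwy)) +
  geom_jitter() +
  # ncol controls the layout; scales = "free_y" frees the y axis per panel
  facet_wrap(~drv, ncol = 1, scales = "free_y")
f
```

Free scales make within-panel patterns easier to see, at the cost of making panels harder to compare directly.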

Saving/exporting

Saving using the GUI

Saving using ggsave()

Save a ggplot with sensible defaults:

ggsave(filename, plot = last_plot(), scale = 1, width, height)
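
A concrete call might look like the sketch below. The file name, size, and resolution are assumptions for illustration, and the file is written to a temporary directory so the example has no side effects on your project.

```r
library(ggplot2)

plt <- ggplot(mtcars, aes(x = wt, y = mpg)) + geom_point()

# The output format is inferred from the file extension
out <- file.path(tempdir(), "scatter.png")
ggsave(out, plot = plt, width = 6, height = 4, units = "in", dpi = 150)

file.exists(out)
```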

Saving using devices

Save any plot with maximum flexibility:

pdf(filename, width, height)  # open device
ggplot()                      # draw the plot(s)
dev.off()                     # close the device
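
Inside a loop, function, or sourced script, a ggplot is not drawn automatically, so it needs an explicit print(). A sketch that writes two pages to one PDF (the output path and the variables plotted are assumptions for this example):

```r
library(ggplot2)

out_pdf <- file.path(tempdir(), "plots.pdf")

pdf(out_pdf, width = 6, height = 4)   # open device
for (v in c("wt", "hp")) {
  g <- ggplot(mtcars, aes(x = .data[[v]], y = mpg)) +
    geom_point() +
    xlab(v)
  print(g)                            # explicit print: one page per plot
}
invisible(dev.off())                  # close the device
```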

Formats

  • pdf
  • jpeg
  • png
  • tiff

and more…

Your turn: save a plot

  1. Save the p plot from above using png() and dev.off()
  2. Switch to the solarized theme with light=FALSE
  3. Adjust the font size with base_size in the theme: + theme_solarized(base_size=24)

Save a plot: Example 1

png("assets/test1.png",width=600,height=300)
p +  theme_solarized(light=FALSE)
dev.off()
## quartz_off_screen 
##                 2

Save a plot: Example 2

png("assets/test2.png",width=600,height=300)
p +  theme_solarized(light=FALSE, base_size=24)
dev.off()
## quartz_off_screen 
##                 2

ggplot2 documentation

Perhaps R’s best-documented package: docs.ggplot2.org
